Genetic Network Programming with Reinforcement Learning and Its Performance Evaluation
Authors
Abstract
A new graph-based evolutionary algorithm named "Genetic Network Programming" (GNP) has been proposed. GNP represents its solutions as directed graph structures, which improves its expression ability and performance. Since GA, GP, and the GNP proposed so far are based on evolution and cannot change their solutions until a generation ends, in this paper we propose GNP with Reinforcement Learning (GNP with RL) in order to search for solutions quickly. The evolutionary algorithm of GNP produces very compact directed graph structures, which contributes to reducing the size of the Q-table and saving memory. The reinforcement learning of GNP improves the search speed for solutions because it can use the information obtained while tasks are carried out.

Recently, a new directed-graph-based evolutionary algorithm named "Genetic Network Programming (GNP)" [1] has been proposed. GA and GP basically represent solutions as string and tree structures, respectively, but GNP represents its solutions as graph structures composed of a number of judgement nodes and processing nodes [Fig. 1]. Judgement nodes are if-then type branch decision functions, and processing nodes determine an action/processing the agent should do. The GNP we use never causes bloat because the number of nodes is predefined, although GNP can still evolve programs of variable size. The graph structure of GNP inherently has some distinctive abilities:

1) Since the graph structure of GNP inherently has the ability to re-use nodes, unlike GA and tree-based GP, GNP can use certain judgement/processing nodes repeatedly to accomplish tasks. Therefore, even if the number of nodes is predefined and smaller than in GP programs, GNP can perform well by making effective programs based on node re-use. We thus do not have to prepare an excessive number of nodes; as a result, the structures of GNP become very compact, which saves calculation time and physical memory.

2) GNP starts its node transition from a start node and continues the transition according to the node connections without any terminal, so the current node is selected under the influence of the past node transitions. Therefore, GNP has an implicit memory function that memorizes the past action sequences of agents in the network flow. The node transition ends when an end condition is satisfied, for example, when the time step reaches the maximum or the GNP program completes the given tasks. A minimal sketch of this transition mechanism is given below.
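To make the node-transition mechanism concrete, the following is a minimal sketch under stated assumptions, not code from the paper: the node types, field names, and the environment interface (env, env.task_completed()) are illustrative. It shows how a fixed-size GNP individual could be executed as a walk over its directed graph.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

# Illustrative node types; names and fields are assumptions for this sketch.
@dataclass
class JudgementNode:
    judge: Callable[[object], int]    # if-then branch decision: env -> branch index
    branches: List[int]               # one outgoing connection per judgement result

@dataclass
class ProcessingNode:
    action: Callable[[object], None]  # action/processing the agent should do
    next_node: int                    # single outgoing connection

Node = Union[JudgementNode, ProcessingNode]

def run_gnp(nodes: List[Node], start: int, env, max_steps: int) -> None:
    """Transit over the fixed-size directed graph from the start node until
    an end condition (here, a time-step limit or task completion) holds."""
    current = start
    for _ in range(max_steps):
        node = nodes[current]
        if isinstance(node, JudgementNode):
            # Judgement node: branch on the result of the decision function.
            current = node.branches[node.judge(env)]
        else:
            # Processing node: perform the action, then follow the single edge.
            node.action(env)
            current = node.next_node
        if env.task_completed():      # assumed environment hook
            break
```

Because judgement nodes only branch and processing nodes only act, the same few nodes can be revisited many times within one run, which is the node re-use property described in 1).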
However, conventional GNP is based on evolution: programs are evaluated and evolved based on their fitness values only after they have been carried out to some extent, so many trials must be repeated. To overcome this problem and search for solutions quickly, we have proposed a new GNP algorithm [2] that combines evolution and reinforcement learning (RL). Since RL is done while agents carry out their task, GNP can search for better solutions at every judgement/processing, in addition to the evolution executed every generation. The aim of combining RL and evolution is to take advantage of the sophisticated search ability of evolution and the online learning of RL. In this paper, a new algorithm of GNP with Reinforcement Learning is proposed.

This method is an extension of the previous GNP, and it provides a more general framework of GNP with RL in that 1) we define new state-action pairs for RL that differ from those of the previous method, 2) the proposed method can change node functions in addition to node connections, and 3) it reduces the number of parameters used by RL, so the size of the Q-table becomes small (a minimal sketch of such an online update appears below). In order to confirm the ability of the proposed method, simulations using the tileworld and maze problems, which are typical benchmark problems for agents, are carried out, and the results of the proposed method are compared with those of standard GNP (GNP using evolution only) and standard tree GP. From the simulation results, it is clarified that the proposed method obtains the best results and learns faster than the previous algorithm [Fig. 2, 3].
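As a hedged illustration of how online RL could operate alongside evolution, the sketch below applies a Sarsa-style update to Q-values indexed by (current node, selected node function). The table layout, the epsilon-greedy selection, and the parameter names (alpha, gamma, epsilon) are assumptions for this sketch, not the paper's exact formulation.

```python
import random
from collections import defaultdict

# Q-table indexed by (node id, node function id). Because the evolved graph
# is compact and its number of nodes is predefined, this table stays small,
# which is the memory-saving point the abstract makes.
Q = defaultdict(float)

def select_function(node_id, function_ids, epsilon=0.1):
    """Epsilon-greedy choice among a node's candidate functions
    (the 'action' of the RL formulation assumed here)."""
    if random.random() < epsilon:
        return random.choice(function_ids)
    return max(function_ids, key=lambda f: Q[(node_id, f)])

def sarsa_update(state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """One online Sarsa step, applied at every judgement/processing during
    the task rather than once per generation."""
    td_target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
```

Updating at every judgement/processing step is what lets learning proceed within a trial, while evolution continues to rewire connections and functions between generations.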
Similar Articles
An Application of Genetic Network Programming Model for Pricing of Basket Default Swaps (BDS)
The credit derivatives market has experienced remarkable growth over the past decade. As such, there is growing interest in tools for pricing the most prominent credit derivative, the credit default swap (CDS). In this paper, we propose a heuristic algorithm for pricing basket default swaps (BDS). For this purpose, genetic network programming (GNP), which is one of the recent evolutiona...
A Multiagent Reinforcement Learning algorithm to solve the Community Detection Problem
Community detection is a challenging optimization problem that consists of searching for communities that belong to a network under the assumption that the nodes of the same community share properties that enable the detection of new characteristics or functional relationships in the network. Although there are many algorithms developed for community detection, most of them are unsuitable when ...
Neural Programming and an Internal Reinforcement Policy
An important reason for the continued popularity of Artificial Neural Networks (ANNs) in the machine learning community is that the gradient-descent backpropagation procedure gives ANNs a locally optimal change procedure and, in addition, a framework for understanding the ANN learning performance. Genetic programming (GP) is also a successful evolutionary learning technique that provides powerf...
The Introduction of a Heuristic Mutation Operator to Strengthen the Discovery Component of XCS
The extended classifier system (XCS) tries to solve learning problems online by producing a set of rules (classifiers). XCS is a rather complex combination of a genetic algorithm and reinforcement learning: the genetic algorithm tries to discover promising rules, and reinforcement learning values them. Among the important factors in the performance of XCS is the possibility to...
Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)
This paper shows how we can take advantage of genetic programming in the selection of suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, the selection of suitable features may significantly affect the performance of the process. Simulations were conducted with 5 dB and 10 dB SNRs. Test and ...